Ensemble-Compression: A New Method for Parallel Training of Deep Neural Networks

Abstract

Parallelization frameworks have recently become a necessity to speed up the training of deep neural networks (DNN). Such a framework typically employs the Model Average approach, denoted as MA-DNN, in which parallel workers conduct respective training based on their own local data while the parameters of the local models are periodically communicated and averaged to obtain a global model which serves as the new start of the local models. However, since DNN is a highly non-convex model, averaging parameters cannot ensure that such a global model performs better than the local models. To tackle this problem, we introduce a new parallel training framework called Ensemble-Compression, denoted as EC-DNN. In this framework, we propose to aggregate the local models by ensemble, i.e., averaging the outputs of the local models instead of their parameters. As most prevalent loss functions are convex with respect to the output of a DNN, the performance of the ensemble-based global model is guaranteed to be at least as good as the average performance of the local models. However, a big challenge lies in the explosion of model size, since each round of ensemble can give rise to a multiple-times size increment. Thus, we carry out model compression after each ensemble, specialized by a distillation-based method in this paper, to reduce the size of the global model to be the same as the local ones. Our experimental results demonstrate the prominent advantage of EC-DNN over MA-DNN in terms of both accuracy and speedup.
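The convexity argument rests on Jensen's inequality: if the loss is convex in the model output, then the loss of the averaged output is no larger than the average of the local losses. To make the contrast between the two aggregation schemes concrete, below is a minimal sketch (not the paper's implementation) of one aggregation round in PyTorch: parameter averaging as in MA-DNN versus EC-DNN's output-level ensemble followed by distillation-based compression. The helper names, the softmax temperature, and the `unlabeled_loader` used for the distillation step are illustrative assumptions, not details taken from the paper.

```python
import copy
import torch
import torch.nn.functional as F

def ma_dnn_aggregate(local_models):
    """MA-DNN: average the parameters of the local models into a global model."""
    global_model = copy.deepcopy(local_models[0])
    with torch.no_grad():
        for name, param in global_model.named_parameters():
            stacked = torch.stack(
                [dict(m.named_parameters())[name].data for m in local_models]
            )
            param.data.copy_(stacked.mean(dim=0))
    return global_model

def ec_dnn_aggregate(local_models, student, unlabeled_loader, optimizer,
                     temperature=2.0):
    """EC-DNN (sketch): ensemble local models by averaging their outputs,
    then distill the ensemble into a single student of the original size."""
    for m in local_models:
        m.eval()
    student.train()
    for x, _ in unlabeled_loader:  # labels unused; targets come from the ensemble
        with torch.no_grad():
            # Ensemble = average of the local models' softened output distributions.
            teacher_probs = torch.stack(
                [F.softmax(m(x) / temperature, dim=1) for m in local_models]
            ).mean(dim=0)
        student_log_probs = F.log_softmax(student(x) / temperature, dim=1)
        loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return student  # serves as the new starting point for every worker
```

In this sketch the compressed student keeps the size of a single local model, so each round's ensemble does not compound the model-size blow-up described in the abstract.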